Faster RCNN Paper Notes

These notes refer to the Fast RCNN architecture repeatedly. Readers unsure about the concepts or architecture of Fast RCNN can refer to

or to my Fast RCNN notes

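From RCNN to Fast RCNN, detection relied on selective search to extract region proposals. Faster RCNN instead extracts region proposals with a neural network, which further speeds up detection while also improving detection accuracy.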

Motivation

Fast RCNN ignores the time spent on region proposals
Region proposal methods used in research are implemented on the CPU
The feature maps can also be used for generating region proposals

In short, extracting proposals takes too much time, and the authors want as much of the computation as possible to run on the GPU.

Region Proposal Networks (RPN)

Basic Notions

Fully-convolutional network for generating region proposals
Share computation of convolutions with state-of-the-art object detection networks

Architecture

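RPN shares its convolutional layers with the detection network; that is, the input to RPN is the feature map output by the last shared convolutional layer. A small window slides over this feature map, and each window position is mapped to a lower-dimensional vector.
This vector is then fed into two sibling fully-connected layers: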
1. box-regression layer (bounding box regressor)
2. box-classification layer (objectness)

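Much like the layers after RoI pooling in Fast RCNN, the features produced by RPN feed two heads: a classifier that decides whether a region contains an object, and a bounding box regressor (that is, given an anchor, it predicts the position of the bounding box relative to that anchor).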
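
To make the regression target concrete, here is a minimal sketch of the box parameterization described in the paper. A box is given by its center, width and height, and the targets are expressed relative to an anchor: t_x = (x - x_a) / w_a, t_y = (y - y_a) / h_a, t_w = log(w / w_a), t_h = log(h / h_a). The function names `encode_box` and `decode_box` and the sample numbers are my own, chosen only for illustration.

```python
import math

def encode_box(box, anchor):
    """Express a (cx, cy, w, h) box as regression targets relative to an anchor."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))

def decode_box(t, anchor):
    """Invert encode_box: turn offsets back into a (cx, cy, w, h) box."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))

# Illustrative values only (center x, center y, width, height).
anchor = (100.0, 100.0, 128.0, 128.0)
gt_box = (110.0, 90.0, 160.0, 96.0)

targets = encode_box(gt_box, anchor)      # what the regressor is trained to output
recovered = decode_box(targets, anchor)   # what is done with its output at test time
```

Because the offsets are normalized by the anchor size, a box identical to its anchor encodes to all zeros, which keeps the regression targets small and comparable across anchor sizes.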

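The center of each sliding window is an anchor, and each anchor comes in 9 variants (3 scales × 3 aspect ratios in the paper). The vector computed at each position is used to predict for all 9 reference boxes, so the outputs of the two layers grow 9-fold as well. The aim is to be more robust to objects of different scales and aspect ratios.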
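
As a rough illustration of the 9 variants, the sketch below builds the anchors for one sliding-window position, assuming the paper's setting of box areas 128², 256², 512² and aspect ratios 1:2, 1:1, 2:1. The function name `make_anchors`, the corner format (x1, y1, x2, y2), and the sample center are my own assumptions.

```python
import math

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return len(scales) * len(ratios) boxes (x1, y1, x2, y2) centered at (cx, cy).

    Each anchor has area scale**2 and height/width ratio `ratio`.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale / math.sqrt(ratio)   # width shrinks as the box gets taller
            h = scale * math.sqrt(ratio)   # so that w * h == scale**2
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors(300.0, 200.0)
k = len(anchors)

# Per sliding-window position: 2 objectness scores and 4 box offsets per anchor.
cls_outputs, reg_outputs = 2 * k, 4 * k
```

With k = 9 the classification layer emits 18 scores and the regression layer 36 coordinates at every position, which is the 9-fold growth in outputs mentioned above.
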
The loss function is defined similarly to the one in Fast RCNN:

\[ L(p_{i},t_{i}) = L_{cls}(p_{i},{p_{i}}^{*}) + \lambda{p_{i}}^{*}L_{reg}(t_{i},{t_{i}}^{*}) \]

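The loss has two parts, a classification loss and a regression loss, with \(\lambda\) controlling the balance between them. In the formula, \(p_{i}\) is the predicted probability that anchor \(i\) contains an object, and \({p_{i}}^{*}\) is the ground truth, a 0-1 indicator that is 1 only when the anchor is labeled as containing an object and 0 otherwise. Because \(L_{reg}\) is multiplied by \({p_{i}}^{*}\), the regression term is active only for positive anchors; \(t_{i}\) and \({t_{i}}^{*}\) are the predicted and ground-truth box coordinates. \(L_{cls}(p_{i},{p_{i}}^{*})\) is defined the same way as in Fast RCNN.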
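
The per-anchor loss can be sketched in a few lines. This assumes a binary log loss for \(L_{cls}\) and the smooth L1 penalty from Fast RCNN for \(L_{reg}\); the function names, the default `lam=1.0`, and the sample numbers are illustrative choices of mine, and the normalization over the mini-batch used in the paper is left out.

```python
import math

def smooth_l1(x):
    """Smooth L1 penalty on one coordinate difference."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5

def rpn_anchor_loss(p, p_star, t, t_star, lam=1.0):
    """Loss for a single anchor: L_cls + lam * p_star * L_reg.

    p         : predicted probability that the anchor contains an object
    p_star    : ground-truth label, 1 for a positive anchor and 0 otherwise
    t, t_star : predicted and target box offsets (4 numbers each)
    """
    # Binary log loss (object vs. not object).
    l_cls = -math.log(p) if p_star == 1 else -math.log(1.0 - p)
    # Regression term; multiplied by p_star, so it vanishes for negatives.
    l_reg = sum(smooth_l1(a - b) for a, b in zip(t, t_star))
    return l_cls + lam * p_star * l_reg

zeros = (0.0, 0.0, 0.0, 0.0)
pos = rpn_anchor_loss(0.9, 1, (0.1, 0.0, 0.2, 0.0), zeros)   # both terms count
neg = rpn_anchor_loss(0.9, 0, (5.0, 5.0, 5.0, 5.0), zeros)   # box error is ignored
```

For the negative anchor the large box error contributes nothing, since there is no ground-truth box to regress toward.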

Training RPN

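Mini-batch construction and initialization
1. Randomly sample 256 anchors from a single image, with positives and negatives at a ratio of up to 1:1 (if there are fewer than 128 positives, negatives fill the rest)
2. Initialize the weights of the new layers from a Gaussian distribution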
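
The sampling rule in item 1 can be sketched as follows. The label convention (1 positive, 0 negative, -1 ignored), the function name `sample_minibatch`, and the toy label list are assumptions made for this example.

```python
import random

def sample_minibatch(labels, batch_size=256, rng=random):
    """Pick anchor indices for one image's mini-batch.

    labels: list with 1 (positive), 0 (negative) or -1 (ignored) per anchor.
    Aims for a 1:1 ratio; if positives are scarce, negatives fill the rest.
    """
    positives = [i for i, label in enumerate(labels) if label == 1]
    negatives = [i for i, label in enumerate(labels) if label == 0]
    num_pos = min(len(positives), batch_size // 2)
    num_neg = min(len(negatives), batch_size - num_pos)
    return rng.sample(positives, num_pos), rng.sample(negatives, num_neg)

rng = random.Random(0)
labels = [1] * 40 + [0] * 5000 + [-1] * 300   # toy labels, not real data
pos_idx, neg_idx = sample_minibatch(labels, rng=rng)
```

Capping the positives at half the batch matters because negative anchors vastly outnumber positive ones in a typical image; sampling uniformly would bias the loss toward the negatives.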
The four training steps, quoted from the paper:
1. First, the RPN is initialized with an ImageNet-pre-trained model and fine-tuned end-to-end for the region proposal task.
2. In the second step, we train a separate detection network by Fast R-CNN using the proposals generated by the step-1 RPN. This detection network is also initialized by the ImageNet-pre-trained model.
3. In the third step, we use the detector network to initialize RPN training, but we fix the shared conv layers and only fine-tune the layers unique to RPN.
4. Now the two networks share conv layers. Finally, keeping the shared conv layers fixed, we fine-tune the fc layers of the Fast R-CNN.

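Overall, the training procedure builds on Fast RCNN training. The region proposals fed to Fast RCNN are supplied by RPN. The Fast RCNN weights are then used to re-initialize RPN so that the two networks share convolutional weights; with these shared weights held fixed, only the layers unique to RPN are fine-tuned, and finally the fully-connected layers of Fast RCNN are fine-tuned.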
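
One way to keep the four steps straight is to write them down as data: which network is trained, where its convolutional weights come from, and which parameter groups are updated. The table below is my own reading of the quoted steps, not code from the paper; the group names `conv`, `rpn` and `fc` are hypothetical labels, and treating the conv layers as updated in step 2 is an assumption based on standard Fast RCNN fine-tuning.

```python
# Each stage: network being trained, source of its conv weights,
# and the parameter groups that receive gradient updates.
STAGES = [
    {"step": 1, "network": "RPN", "conv_from": "ImageNet", "updates": {"conv", "rpn"}},
    {"step": 2, "network": "Fast R-CNN", "conv_from": "ImageNet", "updates": {"conv", "fc"}},
    {"step": 3, "network": "RPN", "conv_from": "step 2", "updates": {"rpn"}},
    {"step": 4, "network": "Fast R-CNN", "conv_from": "step 3", "updates": {"fc"}},
]

def conv_is_frozen(stage):
    """The shared conv layers are frozen when they are not in the update set."""
    return "conv" not in stage["updates"]

def describe(stage):
    state = "frozen" if conv_is_frozen(stage) else "fine-tuned"
    text = "step {step}: train {network}, conv from {conv_from}".format(**stage)
    return text + ", conv " + state

summary = [describe(stage) for stage in STAGES]
```

The conv layers stop changing from step 3 onward, which is exactly what lets the two networks end up with identical shared weights.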

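The benefit is clear. At detection time, the feature map from the convolutional layers goes directly to RPN, and Fast RCNN waits for RPN to compute the proposals, which is fast on the GPU. Unlike Fast RCNN, no selective search is needed: proposal extraction is merged into detection by inserting an RPN between the original convolutional layers and the RoI pooling layer. The fully-connected layers of Fast RCNN then run detection on the regions proposed by RPN.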
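
The data flow at detection time can be sketched with each stage passed in as a callable. Everything here is hypothetical scaffolding: `detect` and the `toy_*` stand-ins are names I made up, and the stand-ins do no real vision work. They only make it possible to run the flow and see that the convolutional layers are evaluated once per image.

```python
def detect(image, backbone, rpn, roi_pool, head):
    """Run one image through the shared pipeline."""
    feature_map = backbone(image)       # conv layers run once per image
    proposals = rpn(feature_map)        # proposals come from the same features
    detections = []
    for box in proposals:
        roi_feature = roi_pool(feature_map, box)   # reuse the shared feature map
        detections.append(head(roi_feature, box))
    return detections

# Toy stand-ins so the flow can be executed.
calls = {"backbone": 0}

def toy_backbone(image):
    calls["backbone"] += 1
    return {"features_of": image}

def toy_rpn(feature_map):
    return [(0, 0, 10, 10), (5, 5, 20, 20)]

def toy_roi_pool(feature_map, box):
    return (feature_map["features_of"], box)

def toy_head(roi_feature, box):
    return {"box": box, "label": "object"}

results = detect("img.jpg", toy_backbone, toy_rpn, toy_roi_pool, toy_head)
```

However many proposals RPN returns, the backbone counter stays at one; that shared computation is where the speedup over a separate selective search step comes from.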

In my view, the progression from RCNN to Faster RCNN starts to look like joint learning of bounding box regression and object classification.

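Finally, combining RPN with Fast RCNN makes it possible to do object detection with neural networks alone, relying entirely on deep learning for image understanding rather than on methods built on low-level features such as selective search.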

Still, when I have time I should read the selective search paper too; it is highly cited, after all.